227 research outputs found

    Automatic Measurement of Memory Hierarchy Parameters

    Full text link
    On modern computers, the running time of many applications is dominated by the cost of memory operations. To optimize such applications for a given platform, it is necessary to have a detailed knowledge of the memory hierarchy parameters of that platform. In practice, this information is usually poorly documented if at all. Moreover, there is growing interest in self-tuning, autonomic software systems that can optimize themselves for different platforms, and these systems must determine memory hierarchy parameters automatically without human intervention. One solution is to use micro-benchmarks to determine the parameters of the memory hierarchy. In this paper, we argue that existing micro-benchmarks are inadequate, and present novel micro-benchmarks for determining the parameters of all levels of the memory hierarchy, including registers, all caches levels and the translation look-aside buffer. We have implemented these micro-benchmarks into an integrated tool that can be ported with little effort to new platforms. We present experimental results that show that this tool successfully determines memory hierarchy parameters on many current platforms, and compare its accuracy with that of existing tools

    Automated Application-level Checkpointing of MPI Programs

    Full text link
    Because of increasing hardware and software complexity, the running time of many computational science applications is now more than the mean-time-to-failure of high-performance computing platforms. Therefore, computational science applications need to tolerate hardware failures. In this paper, we focus on the stopping failure model in which a faulty process hangs and stops responding to the rest of the system. We argue that tolerating such faults is best done by an approach called application-level coordinated non-blocking checkpointing, and that existing fault-tolerance protocols in teh literature are not suitable for implementing this approach. In this paper, we present a suitable protocol, and show how it can be used with a precompiler that instruments C/MPI programs to save application and MPI library state. An advantage of our approach is that it is independent of the MPI implementation. We present experimental results that argue that the overhead of using our system can be small
    • …
    corecore